Exploring the Performance Potential of Itanium® Processors with ILP-based Scheduling

Author

Sebastian Winkel

Abstract

HP and Intel’s Itanium Processor Family (IPF) is considered one of the most challenging processor architectures to generate code for. During global instruction scheduling, the compiler must balance the use of strongly interdependent techniques such as code motion, speculation, and predication. Applying these features too conservatively can leave execution slots empty, contrary to the EPIC philosophy, but overuse can cause resource shortages that cancel out the benefit. We tackle this problem using integer linear programming (ILP), a proven standard optimization method. Our ILP model comprises global, partial-ready code motion with automated generation of compensation code as well as vital IPF features such as control/data speculation and predication. The ILP approach can, with some restrictions, resolve the interdependences between these decisions and deliver the global optimum. This promises a speedup for compute-intensive applications as well as some theoretically founded insights into the potential of the architecture. Experiments with several hot functions from the SPEC benchmarks show substantial improvements: our postpass optimizer reduces the schedule lengths produced by Intel’s compiler by about 20-40%. The resulting speedup of these routines is 16% on average.
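
To make the idea of ILP-based scheduling concrete, the following is a minimal sketch of a generic textbook time-indexed formulation for a single basic block. It is not the model from the paper, which additionally covers global code motion, compensation code, speculation, and predication. All symbols here are assumptions introduced for illustration: $I$ is the set of instructions, $T$ an upper bound on the schedule length, $x_{i,t}$ a binary variable that is 1 when instruction $i$ issues in cycle $t$, $E$ the set of dependence edges with latencies $\ell_{ij}$, $I_r$ the instructions that need a unit of resource type $r$ from the set $R$, $c_r$ the number of such units available per cycle, and $L$ the schedule length.

```latex
\begin{align*}
\text{minimize}\quad & L \\
\text{subject to}\quad
  & \sum_{t=1}^{T} x_{i,t} = 1
      && \forall i \in I
      && \text{(each instruction issues exactly once)} \\
  & \sum_{t=1}^{T} t\, x_{j,t} \;\ge\; \sum_{t=1}^{T} t\, x_{i,t} + \ell_{ij}
      && \forall (i,j) \in E
      && \text{(dependences and latencies)} \\
  & \sum_{i \in I_r} x_{i,t} \;\le\; c_r
      && \forall r \in R,\; 1 \le t \le T
      && \text{(per-cycle resource limits)} \\
  & L \;\ge\; \sum_{t=1}^{T} t\, x_{i,t}
      && \forall i \in I
      && \text{(schedule length)} \\
  & x_{i,t} \in \{0,1\}
      && \forall i \in I,\; 1 \le t \le T
\end{align*}
```

In this sketch the trade-off described in the abstract shows up in the resource constraints: anything that adds instructions to the block, such as compensation or speculative code, competes for the same per-cycle capacity $c_r$.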


Similar resources

Optimal Global Instruction Scheduling for the Itanium

On the Itanium 2 processor, effective global instruction scheduling is crucial to high performance. At the same time, it poses a challenge to the compiler: This code generation subtask involves strongly interdependent decisions and complex trade-offs that are difficult to cope with for heuristics. We tackle this NP-complete problem with integer linear programming (ILP), a search-based method th...


An efficient memory operations optimization technique for vector loops on Itanium 2 processors

To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instructions scheduling. We demonstrate that, if no care is taken at compile time, the non-precise memory disambiguation mechanism and the banking str...


A New ILP Model for Identical Parallel-Machine Scheduling with Family Setup Times Minimizing the Total Weighted Flow Time by a Genetic Algorithm

This paper presents a novel, integer-linear programming (ILP) model for an identical parallel-machine scheduling problem with family setup times that minimizes the total weighted flow time (TWFT). Some researchers have addressed parallel-machine scheduling problems in the literature over the last three decades. However, the existing studies have been limited to the research of independent jobs,...


Dynamic Profile Driven Code Version Selection

In this paper, we study the effectiveness of dynamic code version selection on Itanium® 2 processors. Code version selection can improve the effectiveness of optimizations, adapting them to multiple input sets. In this performance potential study, we conduct experiments on dual-core Itanium® 2 processors and examine the effectiveness of dynamic code version selection of loop scheduling, l...


CSE231 project report: survey on instruction scheduling

This paper surveys past research on instruction scheduling for exploiting more Instruction Level Parallelism (ILP). We focus on static instruction scheduling performed by the compiler. The hardware platform for implementing such compiler techniques, i.e. VLIW, is also reviewed. We also give a comparison between the code scheduling done dynamically by out-of-order machines and that by compilers, along ...




Publication date: 2004
